InsightSwarm: A Multi-Agent Adversarial Framework for Automated Fact-Checking with Real-Time Source Verification, Human-in-the-Loop Oversight, and Adaptive Confidence Calibration

Authors: Soham Gawas, Bhargav Ghawali, Mahesh Gawali, Ayush Devadiga, Shital Gujar

DOI Link: https://doi.org/10.22214/ijraset.2026.79918

Abstract

The rapid spread of misinformation on social and digital media demands automated fact-checking systems that are accurate, calibrated, and transparent. Existing approaches, namely single large-language-model (LLM) classifiers and rule-based systems, suffer from source hallucination rates of 15 to 30 percent and provide no visibility into their reasoning process. We present InsightSwarm, a production-grade multi-agent fact-checking system built on five concrete contributions: (1) adversarial debate between a role-locked ProAgent and ConAgent, each backed by real-time web source retrieval; (2) a multi-layer FactChecker pipeline that independently fetches and validates every cited URL, reducing source hallucination to below 3 percent; (3) Human-in-the-Loop (HITL) intervention via LangGraph interrupt semantics, enabling mid-pipeline human source correction through a live React panel; (4) adaptive confidence calibration using geometric-mean source trust scoring to correct systematic underconfidence; and (5) claim complexity estimation that dynamically adjusts debate depth and resource allocation. Evaluated on a 100-claim FEVER-derived benchmark, InsightSwarm achieves an F1 score of 0.81 versus 0.68 for a zero-shot LLM baseline and 0.56 for a keyword baseline. The full system is open-source and available at https://github.com/AyushDevadiga1/Insight-Swarm.

Introduction

Misinformation spreads quickly online, while manual fact-checking is slow and single-LLM systems suffer from hallucinations and false citations. InsightSwarm addresses these issues by combining adversarial multi-agent reasoning (ProAgent vs. ConAgent) with a FactChecker that validates every cited URL in real time, grounding claims in actual web evidence. It is built as a low-cost, fully reproducible system using free-tier APIs.

The system architecture includes:

A FastAPI backend and a React frontend
A LangGraph-based multi-agent debate pipeline
A FactChecker that detects both missing and misleading citations (Type I and Type II hallucinations)
A Moderator that produces final verdicts using trust-weighted scoring
A semantic cache and API failover system
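The FactChecker's two hallucination types can be illustrated with a minimal sketch. It assumes that Type I means the cited URL cannot be fetched and Type II means the page exists but does not support the quoted claim; this reading, the `Citation` type, the injected `fetch` callable, and the term-overlap heuristic with its `min_overlap` threshold are all hypothetical and are not taken from the InsightSwarm code base.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Citation:
    url: str
    quoted_claim: str  # what the debating agent says the page supports


def _terms(text: str) -> set:
    # Lowercased alphanumeric tokens longer than three characters.
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 3}


def verify_citation(
    citation: Citation,
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns page text or None
    min_overlap: float = 0.5,               # assumed threshold, not from the paper
) -> str:
    """Classify a citation as verified, Type I (missing), or Type II (misleading)."""
    page_text = fetch(citation.url)
    if page_text is None:
        return "type_1_missing"
    claim_terms = _terms(citation.quoted_claim)
    if not claim_terms:
        return "type_2_misleading"
    overlap = len(claim_terms & _terms(page_text)) / len(claim_terms)
    return "verified" if overlap >= min_overlap else "type_2_misleading"


def hallucination_rate(
    citations: Iterable[Citation],
    fetch: Callable[[str], Optional[str]],
) -> float:
    """Fraction of citations that fail verification (0.0 for an empty list)."""
    results = [verify_citation(c, fetch) for c in citations]
    if not results:
        return 0.0
    return sum(r != "verified" for r in results) / len(results)
```

Passing the fetcher in as a parameter keeps the check testable offline. A real deployment would presumably replace the term-overlap heuristic with a stronger entailment check.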

Key innovations include:

Real-time per-URL verification (including detecting fake support from real pages)
Multi-agent adversarial debate grounded in live web evidence
Human-in-the-loop correction during processing
Adaptive confidence calibration for more reliable judgments
Claim complexity estimation to optimize computational effort
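The geometric-mean trust scoring behind the adaptive confidence calibration can be sketched as below. The geometric mean itself is standard, but the adjustment rule, the `max_boost` parameter, and both function names are assumptions made for illustration; the paper summary does not give the actual formula.

```python
import math
from typing import Sequence


def geometric_mean_trust(trust_scores: Sequence[float]) -> float:
    """Geometric mean of per-source trust scores in [0, 1].

    A single untrusted source (score 0) drives the mean to 0, which is
    what distinguishes it from an arithmetic mean.
    """
    if not trust_scores:
        return 0.0
    if any(s < 0.0 or s > 1.0 for s in trust_scores):
        raise ValueError("trust scores must lie in [0, 1]")
    if any(s == 0.0 for s in trust_scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in trust_scores) / len(trust_scores))


def calibrated_confidence(
    raw_confidence: float,
    trust_scores: Sequence[float],
    max_boost: float = 0.2,  # assumed cap on the upward correction
) -> float:
    """Raise an underconfident score in proportion to source trust (assumed rule)."""
    trust = geometric_mean_trust(trust_scores)
    adjusted = raw_confidence + max_boost * trust * (1.0 - raw_confidence)
    return min(1.0, max(0.0, adjusted))
```

Under this rule the correction only moves confidence upward, matching the stated goal of correcting underconfidence, and it vanishes when any cited source has zero trust.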

The system was developed over 25 days, growing from a 400-line prototype into a 15,600-line production system with extensive testing and modular architecture.

In evaluation on a FEVER-based benchmark (100 claims), InsightSwarm outperforms baselines:

F1 score: 0.81, versus 0.68 for the zero-shot LLM baseline and 0.56 for the keyword baseline
Hallucination rate below 3 percent, against a 20 percent baseline
Better calibration and balanced precision/recall
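The relative gains implied by the F1 scores reported in the abstract (0.81 for InsightSwarm, 0.68 for the zero-shot LLM baseline, 0.56 for the keyword baseline) can be worked out as follows. The symbol Δ_rel is introduced here only for illustration.

```latex
\Delta_{\mathrm{rel}}(\text{zero-shot LLM}) = \frac{0.81 - 0.68}{0.68} = \frac{0.13}{0.68} \approx 0.191
\qquad
\Delta_{\mathrm{rel}}(\text{keyword}) = \frac{0.81 - 0.56}{0.56} = \frac{0.25}{0.56} \approx 0.446
```

The first figure is the roughly 19 percent improvement over the single-LLM baseline. In absolute terms the gain is 0.13 F1 over that baseline.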

Conclusion

InsightSwarm demonstrates that multi-agent adversarial fact-checking with per-URL source verification, human-in-the-loop oversight, adaptive confidence calibration, and complexity-driven resource allocation achieves F1 = 0.81, a 19 percent relative improvement over a strong single-LLM baseline (F1 = 0.68), at zero infrastructure cost. The hallucination rate below 3 percent against a 20 percent baseline validates the structural verification approach over prompt-level heuristics. The 25-day development trajectory from a 400-line prototype to a 15,600-line production system illustrates that principled software engineering (test-driven development, modular architecture, iterative hardening) is as consequential as algorithmic novelty in building trustworthy AI systems.

Three directions are planned for future work. First, FAISS-indexed vector retrieval will replace the current linear cache scan, enabling scalable deployment with tens of thousands of cached claims. Second, Celery-based asynchronous task brokering will support multi-user concurrency beyond the current FastAPI synchronous ceiling. Third, an LLM-based fallacy classification head trained on labeled debate transcripts will replace the current regex heuristics in ArgumentationAnalyzer, enabling detection of subtler argumentation failures. In addition, multilingual support (Hindi, Marathi, Tamil, and Bengali) is prioritized for India’s non-English-speaking population, where misinformation spreads at the highest rates.
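To make the scalability concern behind the first future-work item concrete, here is a minimal sketch of a semantic cache that uses a linear scan. The class name, the cosine-similarity metric, and the 0.9 threshold are assumptions for illustration and are not taken from the InsightSwarm code base. Embeddings are passed in as plain float sequences, so no embedding model is assumed.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class LinearSemanticCache:
    """Hypothetical cache that compares a query against every stored claim."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed similarity cut-off for a cache hit
        self._entries: List[Tuple[Sequence[float], str]] = []

    def put(self, embedding: Sequence[float], verdict: str) -> None:
        self._entries.append((embedding, verdict))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        # One similarity computation per cached claim: O(n) work per lookup.
        best_score, best_verdict = 0.0, None
        for cached_embedding, verdict in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_verdict = score, verdict
        return best_verdict if best_score >= self.threshold else None
```

Every lookup touches every cached entry, so cost grows linearly with the number of cached claims. That growth is the limitation an indexed vector store is meant to remove.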


Copyright

Copyright © 2026 Soham Gawas, Bhargav Ghawali, Mahesh Gawali, Ayush Devadiga, Shital Gujar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET79918

Publish Date : 2026-04-10

ISSN : 2321-9653

Publisher Name : IJRASET
